RoCEv2: Add DCQCN congestion control #1430

Open
ruck314 wants to merge 23 commits into pre-release from dcqcnEn

ruck314 commented Jun 2, 2026

Summary

Adds DCQCN (Data Center Quantized Congestion Notification) congestion control to the RoCEv2 engine, plus the supporting IPv4/UDP and AXI-Stream changes needed to carry ECN/CNP signaling.

Originally developed on FilMarini/surf:dcqcnEn; migrated here and merged up to current pre-release (conflict-free).

Changes

DCQCN core (new modules under ethernet/RoCEv2/rtl/)

  • Dcqcn.vhd, AlphaUpdate.vhd, RateIncProc.vhd, RateDecProc.vhd — rate control state machine
  • TokenBucket.vhd, AxisBucket.vhd, TokenCalc.vhd — token-bucket rate limiter
  • AxiStreamMonDcqcn.vhd — rate/throughput monitoring
  • RoceEngineWrapper.vhd — wiring + Dcqcn_en enable parameter; CNP output port
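For reviewers less familiar with DCQCN, here is a minimal behavioral sketch in Python of the reaction-point arithmetic from the published DCQCN algorithm, paired with a token-bucket limiter that consumes the resulting rate. It is not derived from the VHDL in this PR. The class names, the gain `g`, the additive-increase step, the number of fast-recovery steps, and the timer handling are all illustrative assumptions, and the hyper-increase stage and byte counter are left out.

```python
class DcqcnRateModel:
    """Illustrative DCQCN reaction-point model; not derived from the RTL."""

    def __init__(self, line_rate, g=1.0 / 256.0, rate_ai=5.0e6,
                 fast_recovery_steps=5, min_rate=1.0e6):
        self.line_rate = line_rate          # bytes/s, ceiling for the send rate
        self.g = g                          # alpha gain (assumed value)
        self.rate_ai = rate_ai              # additive-increase step (assumed value)
        self.fast_recovery_steps = fast_recovery_steps  # assumed value
        self.min_rate = min_rate            # floor for the send rate (assumed value)
        self.current = line_rate            # current send rate
        self.target = line_rate             # rate to recover toward
        self.alpha = 1.0                    # congestion estimate, starts at 1
        self.steps = 0                      # increase events since the last CNP

    def on_cnp(self):
        """A CNP arrived: remember the old rate, cut the rate, raise alpha."""
        self.target = self.current
        self.current = max(self.min_rate, self.current * (1.0 - self.alpha / 2.0))
        self.alpha = (1.0 - self.g) * self.alpha + self.g
        self.steps = 0

    def on_alpha_timer(self):
        """No CNP during the last alpha period: decay alpha."""
        self.alpha = (1.0 - self.g) * self.alpha

    def on_increase_timer(self):
        """Rate-increase event: fast recovery first, then additive increase."""
        if self.steps >= self.fast_recovery_steps:
            self.target = min(self.line_rate, self.target + self.rate_ai)
        self.current = (self.target + self.current) / 2.0
        self.steps += 1


class TokenBucket:
    """Byte-granular token bucket that enforces the rate chosen above."""

    def __init__(self, burst_bytes):
        self.capacity = burst_bytes
        self.tokens = burst_bytes

    def refill(self, rate, dt):
        """Add rate * dt bytes of credit, capped at the burst size."""
        self.tokens = min(self.capacity, self.tokens + rate * dt)

    def try_send(self, nbytes):
        """Spend credit for one frame; return False if the frame must wait."""
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

In this model a single CNP at `alpha = 1` halves the rate, and each later increase event moves the current rate halfway back toward the pre-cut target. How the RTL splits this work across the alpha-update, rate-increase, rate-decrease, and token-calculation modules is not something this sketch tries to reproduce.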

ECN/CNP signaling path

  • IpV4Engine.vhd / IpV4EngineTx.vhd — DSCP + ECN fields in the IPv4 header
  • EthMacRxRoCEv2.vhd, RocePkg.vhd, UdpEngineWrapper.vhd — plumbing
  • Sgmii88E1111LvdsUltraScale.vhd: RoceV2_en parameter for the RJ45 PHY path
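As background for the header change, the sketch below shows how DSCP and ECN share the second byte of the IPv4 header under RFC 2474 and RFC 3168: DSCP occupies the upper six bits and ECN the lower two. The helper names and the example DSCP value of 26 are hypothetical. How IpV4EngineTx.vhd actually assembles the byte is not shown in this PR description.

```python
# ECN codepoints from RFC 3168
ECN_NOT_ECT = 0b00   # sender is not ECN-capable
ECN_ECT1 = 0b01      # ECN-capable transport, codepoint 1
ECN_ECT0 = 0b10      # ECN-capable transport, codepoint 0
ECN_CE = 0b11        # congestion experienced


def pack_tos(dscp, ecn):
    """Build the IPv4 ToS / Traffic Class byte from DSCP (6 bits) and ECN (2 bits)."""
    if not 0 <= dscp < 64:
        raise ValueError("DSCP must fit in 6 bits")
    if not 0 <= ecn < 4:
        raise ValueError("ECN must fit in 2 bits")
    return (dscp << 2) | ecn


def unpack_tos(tos):
    """Split the ToS byte back into (dscp, ecn)."""
    return tos >> 2, tos & 0x3


def mark_congestion(tos):
    """Model an ECN-aware switch: set CE only if the packet is ECN-capable."""
    dscp, ecn = unpack_tos(tos)
    if ecn == ECN_NOT_ECT:
        return tos
    return pack_tos(dscp, ECN_CE)
```

For example, `pack_tos(26, ECN_ECT0)` gives `0x6A`, and a congested switch would rewrite that byte to `0x6B`. A receiver that sees the CE codepoint is what triggers a CNP back to the sender.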

blue-rdma regen: mkQP.v, mkAxisTransportLayer.v, mkTransportLayer.v (QP destroy fix, RNR-retry-with-fragmentation fix, full 2-byte address, sequential pipeline for timing)

AXI-Stream: AxiStreamCompact.vhd reworked for the congestion-control data path

Software: python/surf/ethernet/roce/_RoceEngine.py DCQCN RemoteVariable register map

Review notes

  • AxiStreamCompact.vhd is a full rewrite of an existing shared surf module (not a new file). This is the main item to scrutinize — confirm it preserves behavior for existing non-DCQCN users.
  • DCQCN is gated behind the Dcqcn_en generic; default-off behavior should be confirmed.
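One way to check the "preserves behavior" point is to compare simulation output against a small software reference model. The sketch below is a hypothetical Python model, assuming the module's job is to repack the bytes flagged valid by tKeep into contiguous beats. The function names, the little-endian lane order, and the 4-byte bus width are illustrative and not taken from the RTL.

```python
def ones_count(keep):
    """Number of valid byte lanes in a tKeep mask."""
    return bin(keep).count("1")


def compact(beats, width=4):
    """Repack valid bytes into contiguous beats.

    beats: list of (tdata, tkeep) tuples, byte lane 0 in the low bits.
    Returns a list of (tdata, tkeep) tuples with no gaps in tkeep.
    """
    payload = bytearray()
    for tdata, tkeep in beats:
        for lane in range(width):
            if (tkeep >> lane) & 1:
                payload.append((tdata >> (8 * lane)) & 0xFF)

    out = []
    for i in range(0, len(payload), width):
        chunk = payload[i:i + width]
        out.append((int.from_bytes(chunk, "little"), (1 << len(chunk)) - 1))
    return out


def total_bytes(beats):
    """Sum of valid bytes across a frame; compaction must not change it."""
    return sum(ones_count(keep) for _, keep in beats)
```

Under these assumptions a full-keep beat such as `(0x44332211, 0xF)` comes out unchanged, and `total_bytes` is the same before and after compaction. A regression test for existing users could feed the old and new module the same frames and compare both against a model like this.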

Status

Draft — pending FW review and register cross-check.

FilMarini and others added 23 commits March 20, 2026 10:20
# Conflicts:
#	axi/axi-stream/rtl/AxiStreamCompact.vhd

Correct indentation of generic/type/generic-map closing parens, move the
XBAR_CONFIG_C assignment operator, and rename local process variables to
camelCase (variable_004) with their colon alignment.

The rewritten AxiStreamCompact preserves payload data, so a full-keep beat
passes straight through. Update the expected output beat to the correct
tData (0x44332211) instead of the old zeroed value.

- AxiStreamCompact: replace local countKeepBytes() with onesCount() from
  StdRtlPkg
- AxiStreamMon: add optional frameUpdate output (registered onto statusClk
  when COMMON_CLK_G); delete duplicate ethernet/RoCEv2/rtl/AxiStreamMonDcqcn.vhd
  and switch TokenBucket caller to surf.AxiStreamMon
- RoceEngineWrapper: change mAxisMetaDataSlave default from
  AXI_STREAM_SLAVE_INIT_C to AXI_STREAM_SLAVE_FORCE_C
- RocePkg: rename SURF_DATA_STREAM_CONFIG_C to ROCEV2_AXIS_CONFIG_C
- Rename RoCEv2 RTL modules with RoCEv2 prefix: AlphaUpdate,
  AxisBucket, Dcqcn, RateDecProc, RateIncProc, TokenBucket, TokenCalc
- IpV4Engine, IpV4EngineTx, UdpEngineWrapper: rename DSCP_G -> ROCEV2_DSCP_G
  and ECN_G -> ROCEV2_ECN_G to mark the IPv4 traffic-class fields as
  RoCEv2-driven; no caller bindings to update
- python/surf/ethernet/roce: split _RoceEngine.py into _RoceEngine.py
  (RoceEngine) and _Dcqcn.py (Dcqcn), one PyRogue device per file
- Replace placeholder descriptions on Dcqcn.CnpCounter and
  Dcqcn.CnpCounterReset with descriptions of the rollover counter and its
  soft-reset behavior

DSCP and ECN are general IPv4 ToS/Traffic Class fields per RFC 2474 and
RFC 3168, not specific to RoCEv2. The ROCEV2_ prefix mislabeled
general-purpose IpV4Engine generics; revert the names.

ruck314 commented Jun 6, 2026

I tested this using slaclab/Simple-10GbE-RUDP-KCU105-Example#11 with rogue@pre-release (latest) and confirmed that it works:

Simple-10GbE-RUDP-KCU105-Example/software$ python scripts/runDispatch.py
Rogue/pyrogue version v6.13.0-83-gf1f2b3a74. https://github.com/slaclab/rogue
Connected to Root at localhost:9099
INFO:dispatch:--- RoCEv2 connection parameters ---
INFO:dispatch:  ConnectionState : Connected
INFO:dispatch:  Host QPN        : 0x12
INFO:dispatch:  Host GID        : 0000:0000:0000:0000:0000:ffff:c0a8:0201
INFO:dispatch:  Host RQ PSN     : 0x1140ce
INFO:dispatch:  Host SQ PSN     : 0x736e9
INFO:dispatch:  MR addr         : 0x791bf84ca000
INFO:dispatch:  MR rkey         : 0x316
INFO:dispatch:  FPGA lkey       : 0x42508e2b
INFO:dispatch:  FPGA QPN        : 0x468bf
INFO:dispatch:  FPGA GID        : 0000:0000:0000:0000:0000:ffff:c0a8:020a
INFO:dispatch:  MaxPayload      : 9000
INFO:dispatch:  RxQueueDepth    : 256
INFO:dispatch:  MrLen           : 2304000
INFO:dispatch:  Payload (= MaxPayload) : 9000
INFO:dispatch:  AddrWrapCount   : 256
INFO:dispatch:------------------------------------
INFO:dispatch:Setting UDP engine destination to 192.168.2.1:4791
INFO:dispatch:Resetting counters...
INFO:dispatch:Dispatching 1 packet(s) of 9000 bytes...
INFO:dispatch:Correctly received 1 / 1 packet(s)
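The logged parameters can be sanity-checked with a few lines of Python. The sketch below decodes the GIDs as IPv4-mapped IPv6 addresses, which is how RoCEv2 over IPv4 normally represents them, and checks that the logged MrLen equals RxQueueDepth times MaxPayload. That product relationship is inferred from the numbers in this log, not stated by the script, and the helper name is hypothetical.

```python
import ipaddress


def gid_to_ipv4(gid):
    """Return the IPv4 address embedded in an IPv4-mapped GID (::ffff:a.b.c.d)."""
    mapped = ipaddress.IPv6Address(gid).ipv4_mapped
    if mapped is None:
        raise ValueError("GID is not an IPv4-mapped address: " + gid)
    return str(mapped)


# Values copied from the log above
HOST_GID = "0000:0000:0000:0000:0000:ffff:c0a8:0201"
FPGA_GID = "0000:0000:0000:0000:0000:ffff:c0a8:020a"
MAX_PAYLOAD = 9000
RX_QUEUE_DEPTH = 256
MR_LEN = 2304000

host_ip = gid_to_ipv4(HOST_GID)    # matches the UDP destination 192.168.2.1
fpga_ip = gid_to_ipv4(FPGA_GID)
mr_len_matches = (RX_QUEUE_DEPTH * MAX_PAYLOAD == MR_LEN)
```

The host GID decodes to 192.168.2.1, the same address the log reports as the UDP engine destination, and the FPGA GID decodes to 192.168.2.10 on the same subnet.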

ruck314 marked this pull request as ready for review June 6, 2026 22:13